A STOCHASTIC REALIZATION APPROACH TO THE SMOOTHING PROBLEM*

Faris Badawi, Anders Lindquist, and Michele Pavon

Department  of  Mathematics 
University  of  Kentucky 
Lexington,  Kentucky  40506 

ABSTRACT:  The  purpose  of  this  paper  is  to  develop  a  theory  of  smoothing 
for  finite  dimensional  linear  stochastic  systems  in  the  context  of  stochastic 
realization  theory.  The  basic  idea  is  to  embed  the  given  stochastic  system 
in  a  class  of  similar  systems  all  having  the  same  output  process  and  the  same 
Kalman-Bucy  filter.  This  class  has  a  lattice  structure  with  a  smallest  and  a 
largest element; these two elements completely determine the smoothing estimates.
This approach enables us to obtain stochastic interpretations of many
important  smoothing  formulas  and  to  explain  the  relationship  between  them. 

1. INTRODUCTION

Let $\{x(t);\ 0 \le t \le T\}$ and $\{y(t);\ 0 \le t \le T\}$ be two stochastic vector
processes, of dimensions $n$ and $m$ respectively, defined as the solution of
the linear system of stochastic differential equations

$$(S)\quad \begin{cases} dx = A(t)x(t)\,dt + B(t)\,dw; & x(0) = \xi \qquad (1.1a) \\ dy = C(t)x(t)\,dt + D(t)\,dw; & y(0) = 0 \qquad (1.1b) \end{cases}$$

where $w$ is a vector process, of dimension $p \ge m$, with orthogonal increments
such that

$$E\{dw\} = 0; \quad E\{dw\,dw'\} = I\,dt \tag{1.2}$$

(prime denotes transposition), $\xi$ is a centered random vector with finite
covariance $\Pi := E\{\xi\xi'\}$ and uncorrelated with $w$, $R(t) := D(t)D(t)'$ is
positive definite on $[0, T]$, and $A$, $B$, $C$, $D$, and $R^{-1}$ are matrices of
analytic functions defined on $[0, T]$. The model $S$ is usually called a
linear stochastic system; $y$ is its output process, $w$ is its input process
and $x$ its state process. We shall assume that the representation $S$ is
minimal  in  the  sense  that  there  is  no  other  model  of  the  form  (1.1)  with  the 
process  y  as  its  output  and  with  a  state  process  x  of  smaller  dimension 
than  n.  Clearly  the  matrix  function  P(t)  :=  E{x(t)x(t)'}  satisfies  the 
differential  equation 

$$\dot P = AP + PA' + BB'; \quad P(0) = \Pi \tag{1.3}$$

on  [0,  T] .  We  shall  call  P  the  state  covariance  function  of  S. 
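
As a rough numerical illustration of (1.3), the following sketch integrates the scalar case ($n = 1$, constant coefficients) and compares it with the closed-form solution of that scalar equation. The function names and the numerical values are hypothetical and chosen for illustration only; the paper itself treats time-varying matrix coefficients.

```python
import math


def lyapunov_rk4(a, b, pi0, T, steps=2000):
    """Integrate the scalar form of (1.3), dp/dt = 2*a*p + b**2 with
    p(0) = pi0, using the classical fourth-order Runge-Kutta scheme."""
    def f(p):
        return 2.0 * a * p + b * b

    h = T / steps
    p = pi0
    for _ in range(steps):
        k1 = f(p)
        k2 = f(p + 0.5 * h * k1)
        k3 = f(p + 0.5 * h * k2)
        k4 = f(p + h * k3)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


def lyapunov_closed_form(a, b, pi0, t):
    """Closed-form solution of the same scalar equation (requires a != 0):
    p(t) = exp(2at) * pi0 + b^2 * (exp(2at) - 1) / (2a)."""
    e = math.exp(2.0 * a * t)
    return e * pi0 + b * b * (e - 1.0) / (2.0 * a)
```

Calling `lyapunov_rk4(-1.0, 1.0, 2.0, 1.0)` and `lyapunov_closed_form(-1.0, 1.0, 2.0, 1.0)` should give values that agree to within integration error.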

The  following  problem  is  of  considerable  importance  in  the  systems  sciences. 
For an arbitrary $t \in [0, T]$, find the linear least-squares estimate $\hat x(t)$ of
the state vector $x(t)$ given the output $\{y(\tau);\ 0 \le \tau \le T\}$, i.e., the wide
sense conditional expectation

$$\hat x(t) = \hat E\{x(t) \mid y(\tau);\ 0 \le \tau \le T\} \tag{1.4}$$

in  the  terminology  of  Doob  [1].  This  is  the  smoothing  problem,  and  it  has 
generated  a  rather  extensive  literature  [2-17].  (See  the  survey  paper  [18] 
for further references.) Here we shall study this problem from a new angle,
employing concepts and techniques from the stochastic realization theory
developed in [20-22] and more recently in [23-33]. The basic idea consists
in embedding the model (1.1) into a class $\mathcal S$ of models $S$, all having the
same process $y$ as their output and all having the same Kalman-Bucy filter.

Such a representation is called a stochastic realization of $y$. (Note that
we only consider proper realizations [20], i.e., models $S$ whose outputs
not merely have the same covariance properties, the only requirement in the
earlier realization theory [34-38], but are equal for each $t$ a.s.) It can
be seen that, slightly extended, the class $\mathcal S$ has a lattice structure with a
smallest ($S_*$) and a largest ($S^*$) element, the partial ordering being induced
by the "size" of the covariance matrix $P(t)$ in the sense that $S_1 < S_2$
if $P_2 - P_1$ is positive definite. This approach will enable us to obtain
stochastic interpretations of many important smoothing formulas and lay the
groundwork for a theory of smoothing which so far has been lacking.

Our interest in the smoothing problem was caused by the Mayne-Fraser two-filter
formula [5, 6], on which topic a large number of papers have been written
[7-9, 12-17]. In some of these papers the authors have encountered difficulties
in motivating this formula, and the many attempts to justify it
stochastically have, in our opinion, been less than convincing.¹ In our stochastic
realization setting the two filters have a natural interpretation:
they are simply the minimum and maximum variance realizations $S_*$ and $S^*$
respectively. Hence the latter is not a "backward filter" as suggested in the
literature (although it can be reformulated as such), but a "forward filter"
just as its structure suggests.

At first sight some of the technical assumptions above may seem rather
stringent, namely the minimality condition and the analyticity of the coefficient
matrices. These conditions are introduced to ensure that, for each
$S \in \mathcal S$, the state covariance matrix $P(t)$ is invertible for each $t \in (0, T)$.
It is quite probable that these assumptions can be relaxed,² but our object
here is to convey some basic ideas, and we do not want to obscure matters by
introducing extra difficulties of a purely technical nature. On the other
hand, the model (1.1) is more general than the one usually encountered in
the smoothing literature in that $B\,dw$ and $D\,dw$ may be correlated. There is
a reason for this too. To limit our analysis to models $S$ for which $BD' = 0$
would render the class $\mathcal S$ incomplete.

The  contents  of  the  paper  are  as  follows.  Section  2  is  devoted  to  some 
preliminary  results.  We  present  a  strict  sense  version  of  some  results  on 
backward  Markovian  representations  developed,  for  much  the  same  purposes,  in 
[15,  16].  The  idea  of  proof  is  borrowed  from  [20].  In  Section  3  we  define 
the  stochastic  realization  setting  mentioned  above,  and  in  Section  4  we  apply 
it  to  derive  and  interpret  various  smoothing  procedures. 

2.  PRELIMINARIES 

Let $H$ be the space of all centered stochastic variables (on an underlying
probability space) with finite second-order moments. Then $H$ is a
Hilbert space with inner product $(\xi, \eta) = E\{\xi\eta\}$. For an arbitrary $k$-dimensional
stochastic process $\{z(t);\ 0 \le t \le T\}$ with components in $H$, define
$H_t(z)$ to be the (closed) subspace spanned by the random variables
$\{z_1(t), z_2(t), \ldots, z_k(t)\}$, and let $H(z)$ be the closed linear hull in $H$
of the subspaces $\{H_t(z);\ 0 \le t \le T\}$; we shall write this as
$H(z) = \bigvee_{t \in [0,T]} H_t(z)$. Similarly define the past space
$H_t^-(z) := \bigvee_{\tau \in [0,t]} H_\tau(z)$ and the future space
$H_t^+(z) := \bigvee_{\tau \in [t,T]} H_\tau(z)$. Sometimes we shall be more
interested in spaces spanned by the increments of $z$. Hence we define $H(dz)$,
$H_t^-(dz)$ and $H_t^+(dz)$ to be the closed linear hulls in $H$ of
$\{z(\tau) - z(\sigma);\ \tau, \sigma \in I\}$ where $I$ is the interval $[0, T]$, $[0, t]$ and $[t, T]$ respectively.

For each $\eta \in H$ and subspace $K \subset H$ let $\hat E\{\eta \mid K\}$ be the projection of $\eta$
onto $K$, i.e., the wide sense conditional mean [1]. Let $u$ be a stochastic
vector with components in $H$, and let $H(u)$ be the closed linear span in $H$
of the components of $u$. Then, for any $\eta \in H$, we shall often write $\hat E\{\eta \mid u\}$
in place of $\hat E\{\eta \mid H(u)\}$, and, for any subspace $K \subset H$, $\hat E\{u \mid K\}$ will denote
the vector with components $\hat E\{u_i \mid K\}$. We shall need the following lemma, the
proof of which can be found in most standard texts on estimation theory.

LEMMA 2.1. Let $u$ and $v$ be two stochastic vectors with components in
$H$ and assume that $E\{vv'\}$ is positive definite. Then

$$\hat E\{u \mid v\} = E\{uv'\}(E\{vv'\})^{-1}v. \tag{2.1}$$
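
The projection formula of Lemma 2.1 can be checked on a small finite probability space. The sketch below is only an illustration under stated assumptions: a scalar $u$, a two-dimensional $v$, equally likely outcomes, and hypothetical function names. It computes the coefficients $E\{uv'\}(E\{vv'\})^{-1}$ and returns the residual, which should be orthogonal to both components of $v$.

```python
def mean(xs):
    """Expectation over equally likely outcomes."""
    return sum(xs) / len(xs)


def projection_coefficients(u, v1, v2):
    """Coefficients (k1, k2) of the projection of u onto span{v1, v2},
    i.e. the row vector E{uv'} (E{vv'})^{-1} of (2.1) for a 2-dimensional v.
    All inputs are lists of centered samples, one entry per outcome."""
    s11 = mean([a * a for a in v1])
    s12 = mean([a * b for a, b in zip(v1, v2)])
    s22 = mean([b * b for b in v2])
    c1 = mean([x * a for x, a in zip(u, v1)])
    c2 = mean([x * b for x, b in zip(u, v2)])
    det = s11 * s22 - s12 * s12
    if det <= 0.0:
        raise ValueError("E{vv'} must be positive definite")
    # Multiply [c1, c2] by the explicit inverse of the 2x2 matrix E{vv'}.
    k1 = (c1 * s22 - c2 * s12) / det
    k2 = (c2 * s11 - c1 * s12) / det
    return k1, k2


def residual(u, v1, v2):
    """Estimation error u - E^{u|v}, outcome by outcome."""
    k1, k2 = projection_coefficients(u, v1, v2)
    return [x - k1 * a - k2 * b for x, a, b in zip(u, v1, v2)]
```

For any centered sample lists with a positive definite $E\{vv'\}$, the correlations `mean(e * v1)` and `mean(e * v2)` of the residual `e` vanish up to rounding, which is the orthogonality property that characterizes the projection.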

The state process $x$ defined by (1.1a) is a wide sense Markov process
[1], i.e.,

$$\hat E\{x(t) \mid H_s^-(x)\} = \hat E\{x(t) \mid x(s)\} \quad \text{for } t \ge s. \tag{2.2}$$

To see this, merely note that $x(t)$ can be written

$$x(t) = \Phi(t, s)x(s) + \int_s^t \Phi(t, \tau)B(\tau)\,dw \tag{2.3}$$

and that $H_s^+(dw) \perp H_0(x) \oplus H_s^-(dw) \supset H_s^-(x)$. (The symbol $\perp$ denotes "orthogonal
to.") Here, of course, $\Phi$ is the transition matrix defined by

$$\frac{\partial \Phi}{\partial t}(t, s) = A(t)\Phi(t, s); \quad \Phi(s, s) = I. \tag{2.4}$$

In deriving the main results of this paper we shall need to reverse the direction
of time in (1.1). The Markov property is independent of the choice of
time direction and therefore we also have

$$\hat E\{x(s) \mid H_t^+(x)\} = \hat E\{x(s) \mid x(t)\} \quad \text{for } t \ge s. \tag{2.5}$$

(In the present setting this can be seen by observing that, in view of (2.3),
$H_t^+(x) \ominus H_t(x) \subset H_t^+(dw) \perp x(s)$.) The differential equation (1.1a), however, is
not symmetric with respect to time; the two terms in the right member of (2.3)
are orthogonal if and only if $t \ge s$. Hence we need to define a backward
version of (1.1a). This requires the inversion of the covariance matrix
$P(t)$, which is the topic of the following lemma. Here and in the sequel
$Q > 0$ ($Q \ge 0$) means that the symmetric matrix $Q$ is positive (nonnegative)
definite.

LEMMA 2.2. Let $P$ be the state covariance function of the linear stochastic
system $S$ defined in §1. Then, for any $\varepsilon > 0$, $P^{-1}$ exists and is analytic
on the interval $[\varepsilon, T]$. If $\Pi > 0$, the same holds for the complete
interval $[0, T]$.

Proof. Integrating (1.3) yields

$$P(t) = \Phi(t, 0)\Pi\Phi(t, 0)' + \int_0^t \Phi(t, \tau)B(\tau)B(\tau)'\Phi(t, \tau)'\,d\tau \tag{2.6}$$

which is positive definite if $\Pi > 0$; hence, since $A$ and $B$ are analytic
on $[0, T]$, so is $P^{-1}$. Now assume that $\Pi \ge 0$. Since $S$ is minimal, $(A, B)$
must be completely controllable. In fact, were this not the case, the input-output
map of $S$ could be reduced [39], contradicting minimality. Since in
addition $A$ and $B$ are analytic, $(A, B)$ is totally controllable [40, 41].
Therefore, since the second term in (2.6) is the controllability gramian,
$P(t) > 0$ on any interval $[\varepsilon, T]$. The analyticity of $P^{-1}$ then follows in
the same way as above. □


As we shall see below, it is more convenient to express the backward
representation in terms of the process

$$\bar x(t) = P(t)^{-1}x(t) \tag{2.7}$$

rather than $x$. In view of Lemma 2.2, $\bar x(t)$ is well defined with components in
$H$ on any interval $[\varepsilon, T]$. Let $\bar P$ denote its covariance function, i.e.,

$$\bar P(t) = E\{\bar x(t)\bar x(t)'\}. \tag{2.8}$$

We are now in a position to formulate a backward version of the state
equation (1.1a).

LEMMA 2.3. Let $x$ be the state process of the linear stochastic system
$S$. Then, for any $\varepsilon > 0$, the process $\bar x$ defined by (2.7) satisfies the
backward model

$$d\bar x = -A(t)'\bar x(t)\,dt + \bar B(t)\,d\bar w; \quad \bar x(T) = \bar\xi \tag{2.9}$$

on $[\varepsilon, T]$, where $\bar\xi = P(T)^{-1}x(T)$, $\bar B = P^{-1}B$ and $\bar w$ is a $p$-dimensional
orthogonal increment process satisfying (1.2) and the condition
$H_t^-(d\bar w) \perp H_t^+(\bar x)$ for all $t$. The increments of $\bar w$ are given by

$$d\bar w = dw - B(t)'P(t)^{-1}x(t)\,dt, \tag{2.10}$$

and the covariance function (2.8) by $\bar P = P^{-1}$; it satisfies the Liapunov
equation

$$\dot{\bar P} = -A'\bar P - \bar PA - \bar B\bar B'; \quad \bar P(T) = \bar\Pi, \tag{2.11}$$

where $\bar\Pi = P(T)^{-1}$. If $\Pi > 0$, equations (2.9)-(2.11) are defined on the whole
interval $[0, T]$.
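
The backward Liapunov equation (2.11) can be sanity-checked numerically in the scalar, constant-coefficient case, where (1.3) has a closed-form solution and the inverse covariance is simply its reciprocal. The sketch below is an illustration with hypothetical names and values: it compares a central finite-difference derivative of $1/P(t)$ with the right member of (2.11), taking the backward input coefficient to be $B/P$ as in the lemma.

```python
import math


def p_forward(a, b, pi0, t):
    """Closed-form solution of the scalar equation dp/dt = 2*a*p + b**2,
    p(0) = pi0 (requires a != 0)."""
    e = math.exp(2.0 * a * t)
    return e * pi0 + b * b * (e - 1.0) / (2.0 * a)


def backward_lyapunov_sides(a, b, pi0, t, h=1e-5):
    """Return (lhs, rhs) of the scalar form of (2.11) at time t:
    lhs = d/dt [1/p(t)]            (central finite difference),
    rhs = -2*a*pbar - bbar**2      with pbar = 1/p and bbar = b/p."""
    def pbar(s):
        return 1.0 / p_forward(a, b, pi0, s)

    lhs = (pbar(t + h) - pbar(t - h)) / (2.0 * h)
    bbar = b / p_forward(a, b, pi0, t)
    rhs = -2.0 * a * pbar(t) - bbar * bbar
    return lhs, rhs
```

The two returned numbers should coincide up to the finite-difference error, reflecting the identity $\frac{d}{dt}P^{-1} = -P^{-1}\dot P P^{-1}$ used in the proof.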

Lemma 2.3 is a strict sense version of a similar result presented in
[15, 16]. As explained in [42], an alternative justification of the wide
sense results [15, 16] can be obtained by means of the earlier work [12, 13].
The version given in all these papers is however insufficient for our purposes
since it provides a deterministic rather than a probabilistic result.
Moreover, we have chosen to write the backward equation in terms of $\bar x$ rather
than $x$ as in [15, 16]. (However, see the "adjoint" formulation in [16].)
The reason for this will become evident in Section 3. Our choice will yield
a backward Kalman-Bucy filter which is invariant over the class $\mathcal S$; the one
in [15, 16] will not.

The proof of Lemma 2.3 follows exactly the same lines as in [20]. It is
based on the observation that, for $s \le t$, the orthogonal decomposition

$$\bar x(s) = \hat E\{\bar x(s) \mid H_t^+(\bar x)\} + [\bar x(s) - \hat E\{\bar x(s) \mid H_t^+(\bar x)\}] \tag{2.12}$$

can be written in the form

$$\bar x(s) = \Phi(t, s)'\bar x(t) + \int_t^s \Phi(\tau, s)'\bar B(\tau)\,d\bar w \tag{2.13}$$

which is the integral form of (2.9).

Proof of Lemma 2.3. In view of Lemma 2.2, the state covariance function
$P$ is invertible on the stated interval. Clearly $\bar P = P^{-1}$. Then, since
$\dot{\bar P} = -\bar P\dot P\bar P$, (2.11) follows from (1.3). Then Lemma 2.1 together with (2.5) and
(2.7) yields

$$\hat E\{\bar x(s) \mid H_t^+(\bar x)\} = \Phi(t, s)'\bar x(t) \tag{2.14}$$

for it follows from (2.3) that $E\{x(s)x(t)'\} = P(s)\Phi(t, s)'$ for $s \le t$. Consequently,
the process $u(t) := \Phi(t, 0)'\bar x(t)$ is a wide sense backward martingale
with respect to $H_t^+(\bar x)$, i.e.,

$$\hat E\{u(s) \mid H_t^+(\bar x)\} = u(t) \quad \text{for } s \le t, \tag{2.15}$$

and hence it has orthogonal increments. We shall now show that $u$ can be
normalized as follows:

$$u(s) - u(t) = \int_t^s \Phi(\tau, 0)'\bar B(\tau)\,d\bar w, \tag{2.16}$$

where $\bar w$ is defined by (2.10). To this end differentiate
$u(t) = \Phi(t, 0)'\bar P(t)x(t)$ and use (1.1a), (2.4) and (2.11) to obtain
$du = \Phi(t, 0)'\bar B(dw - B'\bar Px\,dt)$. It remains to show that $\bar w$ is an orthogonal
increment process satisfying (1.2). This follows from a tedious but straightforward
calculation of the incremental covariance function. (If $B$ were full
rank, we could conclude this directly from the martingale property (2.15);
this could be achieved by working with the complete system $S$ instead.)

The desired representation is then obtained by noting that

$$\bar x(s) = \Phi(0, s)'[u(t) + u(s) - u(t)],$$

into which we insert (2.16) to obtain (2.13). Obviously $H_t^+(\bar x) \perp H_t^-(d\bar w)$, for,
by construction, the two terms in (2.13) are orthogonal for all $t$. □
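
The differentiation step in the proof can be spelled out as follows. This is only a worked sketch, writing $\bar P = P^{-1}$ and $\bar B = P^{-1}B$ as in Lemma 2.3 and using that (2.4) gives $\frac{\partial}{\partial t}\Phi(t,0)' = \Phi(t,0)'A'$; no Itô correction term arises because $u$ is linear in $x$ with deterministic coefficients.

```latex
\begin{aligned}
du &= \Phi(t,0)'A'\,\bar P x\,dt \;+\; \Phi(t,0)'\,\dot{\bar P}\,x\,dt \;+\; \Phi(t,0)'\,\bar P\,dx \\
   &= \Phi(t,0)'\bigl[A'\bar P + (-A'\bar P - \bar P A - \bar B\bar B') + \bar P A\bigr]x\,dt
      \;+\; \Phi(t,0)'\,\bar P B\,dw \\
   &= \Phi(t,0)'\bigl[-\bar B\bar B' x\,dt + \bar B\,dw\bigr] \\
   &= \Phi(t,0)'\,\bar B\,\bigl(dw - B'\bar P x\,dt\bigr),
\end{aligned}
```

where the second line uses (1.1a) and (2.11), and the last line uses $\bar B' = B'\bar P$. The term in parentheses is exactly the increment defined in (2.10).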


3.  FORWARD  AND  BACKWARD  STOCHASTIC  REALIZATIONS 

Let $\{y(t);\ 0 \le t \le T\}$ be an $m$-dimensional vector process defined as the
output of the linear stochastic system $S$ introduced in Section 1. Any system
of type (1.1) [with $\xi \in H$, $w$ satisfying (1.2) and $\xi \perp H(dw)$] having the
given process $y$ as its output is called a realization of $y$. In particular,
by assumption, $S$ is minimal, i.e., there is no other realization of $y$ with
a state process of smaller dimension,³ and analytic, i.e., its parameter
matrices $A$, $B$, $C$, $D$ and $R^{-1}$ are analytic on $[0, T]$. Clearly the components
of $x(t)$ and $y(t)$ belong to $H$ for all $t \in [0, T]$, and the same holds for
the increments of $w$.

It is well known that the least-squares estimate

$$x_*(t) = \hat E\{x(t) \mid H_t^-(dy)\} \tag{3.1}$$

of the state process $x$ of $S$ is generated on $[0, T]$ by the Kalman-Bucy
filter

$$dx_* = Ax_*\,dt + B_*R^{-1/2}(dy - Cx_*\,dt); \quad x_*(0) = 0 \tag{3.2a}$$

where $R^{1/2}(t)$ is the symmetric square root of $R(t) = D(t)D(t)'$, and the gain
function $B_*$ is given by

$$B_* = (Q_*C' + BD')R^{-1/2}, \tag{3.2b}$$

the error covariance matrix

$$Q_*(t) = E\{[x(t) - x_*(t)][x(t) - x_*(t)]'\} \tag{3.2c}$$

being the solution of the matrix Riccati equation

$$\dot Q_* = AQ_* + Q_*A' - (Q_*C' + BD')R^{-1}(Q_*C' + BD')' + BB'; \quad Q_*(0) = \Pi. \tag{3.2d}$$

As we shall see shortly there are other realizations which have the
same Kalman-Bucy filter (3.2a). Hence we define $\mathcal S$ to be the class of all
analytic realizations $S$ of $y$ whose Kalman-Bucy filter, determined as in (3.2),
has the same coefficient functions $A$, $C$, $R$ and $B_*$ as in (3.2a). Then (since
we only consider proper [20] realizations) the estimates $x_*$ are also the same.
(The error covariance $Q_*$, however, will of course vary over $\mathcal S$.) Clearly all
realizations in $\mathcal S$ are minimal. Moreover, it is well known that the innovation
process $\{w_*(t);\ 0 \le t \le T\}$, whose increments are defined by

$$dw_* = R^{-1/2}(dy - Cx_*\,dt), \tag{3.3}$$

is a process with orthogonal increments satisfying (1.2) and
$H_t^-(dw_*) = H_t^-(dy)$ for all $t \in [0, T]$ (see e.g. [43]). Then (3.2a) and (3.3) yield

$$(S_*)\quad \begin{cases} dx_* = Ax_*\,dt + B_*\,dw_*; \quad x_*(0) = 0 \\ dy = Cx_*\,dt + R^{1/2}\,dw_*. \end{cases} \tag{3.4}$$

Since $B_*$ is analytic on $[0, T]$, this is a realization in $\mathcal S$ whose state
covariance matrix $P_*(t) := E\{x_*(t)x_*(t)'\}$ satisfies

$$\dot P_* = AP_* + P_*A' + B_*B_*'; \quad P_*(0) = 0. \tag{3.5}$$

(This can also be seen by subtracting (3.2d) from (1.3), noting that
$Q_* = P - P_*$.) Now define the $n \times m$ matrix function

$$G = P_*C' + B_*R^{1/2}, \tag{3.6}$$

which is clearly analytic on $[0, T]$.

LEMMA 3.1. Let $G$ be defined by (3.6). Then for any realization $S \in \mathcal S$,

$$P(t)C(t)' + B(t)D(t)' = G(t) \tag{3.7}$$

for all $t \in [0, T]$.

Proof. This follows from (3.2b) and the fact that $Q_* = P - P_*$. □
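
Lemma 3.1 and the relation between the filter error covariance and the two state covariances can be illustrated numerically. The sketch below is a simplified instance under stated assumptions: a scalar state, constant coefficients, and row vectors `B` and `D` acting on a $p$-dimensional input. The function name and all values are hypothetical. It integrates (1.3), the Riccati equation (3.2d) and (3.5) jointly, then evaluates both sides of (3.7).

```python
def filter_covariances(a, c, B, D, pi0, T, steps=2000):
    """Jointly integrate, for a scalar state with constant coefficients,
      (1.3)  dP/dt   = 2aP + BB'
      (3.2d) dQ/dt   = 2aQ - (Qc + BD')^2 / R + BB',   Q(0) = pi0
      (3.5)  dPs/dt  = 2aPs + (Qc + BD')^2 / R,        Ps(0) = 0
    with classical RK4. Returns (P, Q, Ps, G_filter, G_model) at time T."""
    bb = sum(x * x for x in B)                 # BB'
    bd = sum(x * y for x, y in zip(B, D))      # BD'
    r = sum(y * y for y in D)                  # R = DD'
    if r <= 0.0:
        raise ValueError("R = DD' must be positive")

    def rhs(state):
        p, q, ps = state
        g = q * c + bd             # equals B_* R^{1/2} by (3.2b)
        gain_sq = g * g / r        # equals B_* B_*'
        return (2.0 * a * p + bb,
                2.0 * a * q - gain_sq + bb,
                2.0 * a * ps + gain_sq)

    def shifted(state, k, step):
        return tuple(s + step * ki for s, ki in zip(state, k))

    h = T / steps
    state = (pi0, pi0, 0.0)
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(shifted(state, k1, 0.5 * h))
        k3 = rhs(shifted(state, k2, 0.5 * h))
        k4 = rhs(shifted(state, k3, h))
        state = tuple(s + h * (x1 + 2.0 * x2 + 2.0 * x3 + x4) / 6.0
                      for s, x1, x2, x3, x4 in zip(state, k1, k2, k3, k4))
    p, q, ps = state
    g_filter = ps * c + (q * c + bd)   # (3.6): P_* C' + B_* R^{1/2}
    g_model = p * c + bd               # left member of (3.7): PC' + BD'
    return p, q, ps, g_filter, g_model
```

In such a run the identity $Q_* = P - P_*$ holds up to rounding, and consequently the two evaluations of $G$ agree, which is the content of (3.7) for this particular model.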

Consequently $A$, $C$, $G$ and $R$ are invariants for the class $\mathcal S$ (in fact,
the covariance function of $y$ is determined by these four matrix functions
[37, 44]), whereas $B$, $D$, $P$, $w$ and $x$ will vary with different realizations
$S$. Actually even the dimension $p$ of the process $w$ will vary. However,
since $R$ is full rank, we will always have $p \ge m$.

The Kalman-Bucy filter realization $S_*$ belongs to a class of realizations
for which $p$ is minimal, i.e., $p = m$. Define $\mathcal S_0$ to be the subclass
of all $S \in \mathcal S$ such that $p = m$ and $x(0) \in H(dy)$. (Note that, since $y(0) = 0$,
$H(dy) = H(y)$. We shall use the former notation as we are really only interested
in the increments of $y$, the assumption $y(0) = 0$ being one of
convenience.) Let

$$(S_0)\quad \begin{cases} dx_0 = Ax_0\,dt + B_0\,dw_0; \quad x_0(0) = \xi_0 \\ dy = Cx_0\,dt + D_0\,dw_0 \end{cases} \tag{3.8}$$

be a realization in $\mathcal S_0$ with state covariance function $P_0$. Then $D_0$ is
invertible and therefore

$$dx_0 = Ax_0\,dt + B_0D_0^{-1}(dy - Cx_0\,dt); \quad x_0(0) = \xi_0. \tag{3.9a}$$

Now let (1.1) be an arbitrary realization in $\mathcal S$ and define

$$Q_0 = P - P_0. \tag{3.9b}$$

Then Lemma 3.1 yields

$$B_0 = (Q_0C' + BD')(D_0')^{-1} \tag{3.9c}$$

where $Q_0$ satisfies the matrix Riccati equation

$$\dot Q_0 = AQ_0 + Q_0A' - (Q_0C' + BD')R^{-1}(Q_0C' + BD')' + BB'; \quad Q_0(0) = \Pi - \Pi_0. \tag{3.9d}$$

To see this just insert (3.9c) into the equation (1.3) corresponding to $P_0$
and subtract from (1.3). Formally (3.9) looks precisely like the Kalman-Bucy
filtering equations (3.2). In fact, the differential equations are the same;
only the initial conditions differ. However, note that, unlike $Q_*$, $Q_0$ is
in general indefinite due to the definition of $Q_0(0)$. In view of the fact
that $\xi_0 \in H(dy)$, (3.9a) implies that $H(x_0) \subset H(dy)$. We shall call a realization
$S \in \mathcal S$ satisfying the condition $H(x) \subset H(dy)$ internal; if
$H(x) \not\subset H(dy)$ we shall say that $S$ is external [20]. Hence we have shown that
all $S \in \mathcal S_0$ are internal. In Section 4 we shall see that, if [...] is full
rank, $\mathcal S_0$ is precisely the class of all internal realizations. We shall also
see that $\mathcal S_0$ is a partially ordered set with a smallest element and that it
can be slightly extended to also contain a largest element.

Our next task is to establish a backward counterpart $\bar S$ to each realization
$S \in \mathcal S$. We shall begin by restricting our attention to the subclass $\mathcal S_+$
of all realizations $S \in \mathcal S$ for which $\Pi > 0$.

LEMMA 3.2. The class $\mathcal S_+$ is nonempty.

Proof. It is shown in [44] that, since $y$ is generated by the model (1.1),
for some $\varepsilon > 0$ the covariance function of $y$ can be continuously extended to
the interval $[0, T + \varepsilon]$ while retaining its nonnegativity property and its
"lumped" character. It is not hard to modify the proof of [44, Appendix II]
to show that a similar extension, which also preserves analyticity, can be made
to the interval $[-\varepsilon, T]$ for some $\varepsilon > 0$. Hence, by the main result of [44],
there is an (analytic) realization $S_\varepsilon$ of $y$ on $[-\varepsilon, T]$ with state dimension
$n$. Since its restriction to $[0, T]$ belongs to $\mathcal S$, it is minimal.
Therefore $(A, B)$ corresponding to $S_\varepsilon$ is totally controllable [40], and
consequently $P(0) > 0$ by the argument of Lemma 2.2. Hence the restriction
of $S_\varepsilon$ to $[0, T]$ belongs to $\mathcal S_+$. □

Let $S \in \mathcal S_+$. Then, by Lemma 2.3, $\bar x = P^{-1}x$ is defined on all of $[0, T]$
and satisfies (2.9) there. Inserting (2.10) into (1.1b) yields
$dy = (CP + DB')\bar x\,dt + D\,d\bar w$, so in view of Lemma 3.1 we have obtained a backward model
for $y$ on $[0, T]$, namely

$$(\bar S)\quad \begin{cases} d\bar x = -A'\bar x\,dt + \bar B\,d\bar w; \quad \bar x(T) = \bar\xi \\ dy = G'\bar x\,dt + D\,d\bar w \end{cases} \tag{3.10}$$

where $\bar\xi = P(T)^{-1}x(T) \perp H(d\bar w)$ and $\bar B = P^{-1}B$. Its state covariance function
$\bar P = P^{-1}$ satisfies (2.11). We shall call any model of type (3.10) with $y$
as its output, $\bar\xi \in H$, $\bar w$ satisfying (1.2) and $\bar\xi \perp H(d\bar w)$ a backward realization
of $y$. In view of Lemma 2.2, $\bar S$ is also analytic (i.e., $A$, $\bar B$, $G$, $D$ and $R^{-1}$
are analytic). Note that $S$ and $\bar S$ have the same state space, i.e.,

$$H_t(\bar x) = H_t(x), \tag{3.11}$$

for each $t \in [0, T]$.

By symmetry with the forward setting we can now see that

$$\bar x_*(t) = \hat E\{\bar x(t) \mid H_t^+(dy)\} \tag{3.12}$$

is generated by the backward Kalman-Bucy filter

$$d\bar x_* = -A'\bar x_*\,dt + \bar B_*R^{-1/2}(dy - G'\bar x_*\,dt); \quad \bar x_*(T) = 0, \tag{3.13a}$$

where $\bar B_* = -(\bar Q_*G - \bar BD')R^{-1/2}$, and the error covariance
$\bar Q_*(t) := E\{[\bar x(t) - \bar x_*(t)][\bar x(t) - \bar x_*(t)]'\}$ satisfies

$$\dot{\bar Q}_* = -A'\bar Q_* - \bar Q_*A + (\bar Q_*G - \bar BD')R^{-1}(\bar Q_*G - \bar BD')' - \bar B\bar B'; \quad \bar Q_*(T) = P(T)^{-1}, \tag{3.13b}$$

and that the backward innovation process $\{\bar w_*(t);\ 0 \le t \le T\}$, given by

$$d\bar w_* = R^{-1/2}(dy - G'\bar x_*\,dt) \tag{3.14}$$

has orthogonal increments and satisfies (1.2) and $H_t^+(d\bar w_*) = H_t^+(dy)$ for all
$t \in [0, T]$ (see [20, 45]). Hence the covariance function
$\bar P_*(t) := E\{\bar x_*(t)\bar x_*(t)'\}$ satisfies

$$\dot{\bar P}_* = -A'\bar P_* - \bar P_*A - \bar B_*\bar B_*'; \quad \bar P_*(T) = 0. \tag{3.15}$$

The following lemma ensures the invariance of the backward filter (3.13a).

LEMMA 3.3. The gain function $\bar B_*$ is uniquely determined by the four
(invariant) matrix functions $A$, $C$, $G$, and $R$, i.e., $\bar B_*$ is invariant for $\mathcal S$.

Proof. Since $\bar Q_* = \bar P - \bar P_*$, and since Lemma 3.1 gives $\bar BD' = \bar PG - C'$,
it follows that $\bar B_* = -(C' - \bar P_*G)R^{-1/2}$, which inserted into (3.15) yields
an equation for $\bar P_*$ which only depends on $A$, $C$, $G$ and $R$. Hence the same
holds for $\bar B_*$. □


Now define $\bar{\mathcal S}$ to be the class of all analytic backward realizations $\bar S$
having (3.13a) as its backward Kalman-Bucy filter, and let $\bar{\mathcal S}_+$ be the subclass
consisting of those $\bar S \in \bar{\mathcal S}$ for which $\bar\Pi > 0$. In the same way as in the
forward setting it is seen that the realization

$$(\bar S_*)\quad \begin{cases} d\bar x_* = -A'\bar x_*\,dt + \bar B_*\,d\bar w_*; \quad \bar x_*(T) = 0 \\ dy = G'\bar x_*\,dt + R^{1/2}\,d\bar w_* \end{cases} \tag{3.16}$$

belongs to $\bar{\mathcal S}$. The state covariance function $\bar P_*$ of $\bar S_*$ is given by (3.15).
By Lemma 3.3 the class $\bar{\mathcal S}$ is uniquely defined in terms of the invariants $A$,
$C$, $G$ and $R$, and therefore the backward counterpart $\bar S$ of any $S \in \mathcal S_+$ belongs
to $\bar{\mathcal S}$. In particular, since $P(T)$ is finite and positive definite,
$\bar S \in \bar{\mathcal S}_+$. Obviously there is a complete symmetry between the forward and the
backward settings; all results of this section have backward versions obtained
by merely starting from a minimal backward realization instead. Consequently
all realizations in $\bar{\mathcal S}$ are minimal. (Indeed, were this not the case, we
could, by reversing the procedure above, construct a forward realization of
dimension less than $n$, contradicting our original minimality assumption.)
Moreover, it is not hard to see that the two subclasses $\mathcal S_+$ and $\bar{\mathcal S}_+$ are in
one-to-one correspondence.

In order to extend the one-to-one correspondence between forward and backward
realizations beyond $\mathcal S_+$ and $\bar{\mathcal S}_+$ we shall have to enlarge the classes
$\mathcal S$ and $\bar{\mathcal S}$ slightly in the following way. Let $\hat{\mathcal S}$ be the class of all analytic
realizations (1.1) of $y$ defined on $[0, T - \varepsilon]$ for any $\varepsilon > 0$ and
having (3.2a) for its Kalman-Bucy filter on each such interval, and let $\hat{\mathcal S}_0$
be the corresponding extension of $\mathcal S_0$. The classes $\hat{\bar{\mathcal S}}$ and $\hat{\bar{\mathcal S}}_0$ are defined
analogously with respect to (3.13a) and all intervals $[\varepsilon, T]$. We shall call
the elements of $\hat{\mathcal S}$ and $\hat{\bar{\mathcal S}}$ generalized realizations. Clearly $\mathcal S \subset \hat{\mathcal S}$ and
$\bar{\mathcal S} \subset \hat{\bar{\mathcal S}}$. Then the forward-backward construction above can be redone in the light
of Lemma 2.3 to yield the following theorem, which also summarizes some of the
pertinent facts on this topic.

THEOREM 3.1. To each realization (1.1) in $\mathcal S$ there corresponds a generalized
backward realization (3.10) in $\hat{\bar{\mathcal S}}$ such that $\bar P = P^{-1}$, $\bar B = P^{-1}B$,
$\bar x = P^{-1}x$ and $d\bar w = dw - B'P^{-1}x\,dt$. Likewise to each backward realization
(3.10) in $\bar{\mathcal S}$ there is a generalized realization (1.1) in $\hat{\mathcal S}$ such that
$P = \bar P^{-1}$, $B = \bar P^{-1}\bar B$, $x = \bar P^{-1}\bar x$ and $dw = d\bar w + \bar B'\bar P^{-1}\bar x\,dt$. For each such pair
$(S, \bar S)$ of forward and backward (generalized) realizations, relation (3.11)
holds for each $t$ for which both $S$ and $\bar S$ are defined.

Since $\bar P_*(T) = 0$, the backward filter realization $\bar S_*$ has a forward
counterpart only in this generalized sense, and it has the form

$$(S^*)\quad \begin{cases} dx^* = Ax^*\,dt + B^*\,dw^*; \quad x^*(0) = \bar P_*(0)^{-1}\bar x_*(0) \\ dy = Cx^*\,dt + R^{1/2}\,dw^* \end{cases} \tag{3.17}$$

with state covariance function $P^* = \bar P_*^{-1}$ satisfying

$$\dot P^* = AP^* + P^*A' + B^*B^{*\prime}; \quad P^*(0) = \bar P_*(0)^{-1} \tag{3.18}$$

on $[0, T)$. Obviously $P^*(t) \to \infty$ as $t \to T$. [Note that $B^*B^{*\prime}$ is not integrable
on $(0, T)$.] The following lemma explains the "super star" notation.

LEMMA 3.4. Let $P$ be the state covariance function of a realization
$S \in \mathcal S$. Then

$$P_*(t) \le P(t) \le P^*(t) \tag{3.19}$$

for all $t \in [0, T)$.

Proof. Since $Q_*(t)$ is a covariance matrix, $Q_*(t) \ge 0$. But $Q_* = P - P_*$,
and therefore $P(t) \ge P_*(t)$. An analogous argument in the backward
setting yields $\bar P(t) \ge \bar P_*(t)$, i.e., $P(t)^{-1} \ge P^*(t)^{-1}$, from which $P(t) \le P^*(t)$
follows. □
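
The last step of this proof relies on the fact that inversion reverses the ordering of positive definite matrices. A small sketch for the $2 \times 2$ symmetric case is given below; the helper names and the test matrices are hypothetical, and positive semidefiniteness is checked through the trace and determinant, a criterion valid for symmetric $2 \times 2$ matrices only.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("singular matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def sub2(x, y):
    """Entrywise difference x - y of two 2x2 matrices."""
    return [[x[i][j] - y[i][j] for j in range(2)] for i in range(2)]


def is_psd2(m, tol=1e-12):
    """A symmetric 2x2 matrix is nonnegative definite exactly when both
    its trace and its determinant are nonnegative."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return tr >= -tol and det >= -tol
```

With two positive definite matrices `p_small` and `p_big` such that `is_psd2(sub2(p_big, p_small))` holds, one finds that `is_psd2(sub2(inv2(p_small), inv2(p_big)))` holds as well, mirroring the passage from the backward inequality to $P(t) \le P^*(t)$.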

Relation (3.19) induces a partial ordering of $\mathcal S$, $S_*$ being the smallest
and $S^*$ the largest element; the same holds for $\mathcal S_0$, for both $S_*$ and $S^*$
belong to this subclass. (It can be shown that $\mathcal S$ and $\mathcal S_0$ have lattice structures,
but this goes beyond the scope of this paper.) Since $S^* \in \mathcal S_0$, $x^*$ satisfies
a Kalman-Bucy type equation

$$dx^* = Ax^*\,dt + B^*R^{-1/2}(dy - Cx^*\,dt); \quad x^*(0) = \xi^*, \tag{3.20a}$$

where $\xi^* = \bar P_*(0)^{-1}\bar x_*(0)$, and $B^*$ can be determined from any other realization
$S \in \mathcal S$ through equations (3.9c,d), setting $B^* = B_0$ and $Q_0(0) = \Pi - \Pi^*$.
The corresponding solution of the matrix Riccati equation (3.9d) is, in
view of (3.9b), $Q_0 = P - P^*$, which is nonpositive definite (Lemma 3.4). For
the smoothing problem it will be more convenient to express $B^*$ in terms of
a nonnegative definite solution of (3.9d) instead, and therefore we define
$Q^* := -Q_0$, i.e.,

$$Q^* = P^* - P, \tag{3.20b}$$

in terms of which (3.9c,d) yields

$$B^* = -(Q^*C' - BD')R^{-1/2} \tag{3.20c}$$

with $Q^*$ satisfying the matrix Riccati equation

$$\dot Q^* = AQ^* + Q^*A' + (Q^*C' - BD')R^{-1}(Q^*C' - BD')' - BB'; \quad Q^*(0) = \Pi^* - \Pi, \tag{3.20d}$$

where $\Pi^* = \bar P_*(0)^{-1}$. Clearly $Q^*(t) \to \infty$ as $t \to T$. The definition (3.20b)
enables us to interpret $Q^*$ as an error covariance function, much in analogy
with the Kalman-Bucy filter. In fact,

$$Q^*(t) = E\{[x(t) - x^*(t)][x(t) - x^*(t)]'\} \tag{3.21}$$

for all $t \in [0, T)$. This is an immediate consequence of the following lemma,
which we shall also need in §4.


LEMMA 3.5. Let $x$ be the state process and $P$ the state covariance
function of any realization in $\mathcal S$. Then

$$E\{x(t)x_*(t)'\} = P_*(t), \quad E\{x(t)x^*(t)'\} = P(t) \tag{3.22}$$

and

$$E\{[x(t) - x_*(t)][x^*(t) - x(t)]'\} = 0. \tag{3.23}$$

Proof. In view of the definition (3.1), $H_t(x - x_*) \perp H_t(x_*)$ and therefore
the first of relations (3.22) follows. The analogous relation in the
backward setting reads $E\{\bar x(t)\bar x_*(t)'\} = \bar P_*$. Hence
$E\{x(t)x^*(t)'\} = PE\{\bar x(t)\bar x_*(t)'\}P^* = P$, for $\bar x = P^{-1}x$ and $P^* = \bar P_*^{-1}$. Then (3.23) is an immediate
consequence of (3.22). □

In §4 we shall need to invert both $Q_*(t)$ and $Q^*(t)$ for arbitrary
$t \in [0, T)$. This is possible for all realizations $S \in \mathcal S$ such that
$P_*(t) < P(t) < P^*(t)$ for all $t$ on this interval. We shall call the class
of all such $S$ the interior of $\mathcal S$ and denote it int $\mathcal S$.


]') 

LEMMA 3.6. The interior of $\mathcal{S}$ is nonempty.

Proof. Let $Q_*$ be the error covariance (3.2c) corresponding to a realization
$S \in \mathcal{S}_+$. Then $Q_*(0) = \Pi > 0$. A simple reformulation of (3.2d) yields

$\dot{Q}_* = \Gamma_* Q_* + Q_* \Gamma_*' + (B_* R^{-1/2} D - B)(B_* R^{-1/2} D - B)'$  (3.24)

where $\Gamma_*$ is the feedback matrix

$\Gamma_* = A - B_* R^{-1/2} C$  (3.25)

of the Kalman-Bucy filter (3.2). The Liapunov type equation (3.24) can be
integrated to yield an expression of the same general form as (2.6). From
this it is seen that $Q_*(0) > 0$ implies that $Q_*(t) > 0$ for all $t \in [0, T]$.

It remains to show that $Q^*(t) > 0$ for all $t \in [0, T)$. To this end first
note that the corresponding backward realization $\bar{S}$ belongs to the
corresponding class of backward realizations; this is clear from the
discussion leading to Theorem 3.1. Then we can repeat the argument above to
see that $\bar{Q}_*(t) > 0$ for all $t \in [0, T]$. But $Q^* = P(\bar{P} - \bar{P}_*)P^* =
P\bar{Q}_*P^*$. Since $P > 0$ and $P^* > 0$ on $[0, T)$, $Q^*(t) > 0$ for all $t \in [0, T)$. □
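The positivity argument for (3.24) can be illustrated numerically. The following is only a minimal sketch under assumptions that are not in the text: the state is scalar, and $\Gamma_*$ and the driving term $B_* R^{-1/2} D - B$ are replaced by hypothetical constants `gamma` and `g`. It integrates the scalar form of (3.24) with forward Euler steps and shows that a positive initial value stays positive.

```python
def integrate_error_covariance(gamma, g, q0, horizon, steps):
    """Forward-Euler solution of dq/dt = 2*gamma*q + g**2.

    This is the scalar, constant-coefficient form of (3.24); gamma stands in
    for Gamma_* and g for (B_* R^{-1/2} D - B). Both are illustrative values.
    """
    dt = horizon / steps
    q = q0
    path = [q]
    for _ in range(steps):
        q = q + dt * (2.0 * gamma * q + g * g)
        path.append(q)
    return path


# Hypothetical data: stable feedback, positive initial covariance.
path = integrate_error_covariance(gamma=-1.0, g=0.5, q0=1.0, horizon=1.0, steps=1000)
print(min(path) > 0.0)
```

The step size must satisfy `1 + 2*gamma*dt > 0` for the discrete recursion to inherit positivity; this is a property of the Euler scheme, not of (3.24) itself.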

COROLLARY 3.6.1. Let $Q = P^* - P_*$. Then $Q(t) > 0$ for all $t \in [0, T)$.

COROLLARY 3.6.2. Let $S \in \mathcal{S}_+$. Then $Q_*(t) > 0$ for all $t \in [0, T]$.

We shall now demonstrate that the two processes $x_*$ and $x^*$ together
contain all the relevant information on $y$ needed in estimating the state
process $x$ of an arbitrary realization $S \in \mathcal{S}$. To this end first note that
(3.1) can be written

$E\{H_t(x) \mid H_t^-(dy)\} = H_t(x_*)$,  (3.26)

and that (3.11) and (3.12) yield

$E\{H_t(x) \mid H_t^+(dy)\} = H_t(x^*)$  (3.27)

for all $t \in [0, T)$. Now define the orthogonal complements $N_t^- := H_t^-(dy) \ominus
H_t(x_*)$ and $N_t^+ := H_t^+(dy) \ominus H_t(x^*)$ respectively. Then we obtain the
orthogonal decomposition

$H(dy) = N_t^- \oplus H_t^\square \oplus N_t^+$  (3.28)

where $H_t^\square$ is the frame space

$H_t^\square = H_t(x_*) \vee H_t(x^*)$  (3.29)

(where $A \vee B$ denotes the closed linear hull in $H$ of $A$ and $B$). Cf. [22,
24, 26].

LEMMA 3.7. (cf. [27]) Let $x$ be the state process of a realization in
$\mathcal{S}$. Then, for $t \in [0, T)$,

$H_t(x) \subset H_t^\square \oplus [H(dy)]^\perp$

where $[H(dy)]^\perp$ is the orthogonal complement of $H(dy)$ in $H$.

Proof. Clearly $H_t(x) \perp N_t^-$. To see this note that the components of
$x(t) - x_*(t)$ are orthogonal to $H_t^-(dy) \supset N_t^-$ and that the components of $x_*(t)$
belong to $H_t(x_*) \perp N_t^-$. In the same way we show that $H_t(x) \perp N_t^+$. □

4. THE SMOOTHING PROBLEM

Consider an arbitrary realization (1.1) in the class $\mathcal{S}$. The basic problem
before us is to determine the smoothing estimate

$\hat{x}(t) = E\{x(t) \mid H(dy)\}$  (4.1)

for each $t \in [0, T)$ and to interpret it in terms of stochastic realizations.
Let $\Sigma$ denote the corresponding estimation error covariance, i.e.,

$\Sigma(t) = E\{[x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]'\}$.  (4.2)

Of course this problem is interesting only if the realization $S$ is external.
However, by not restricting our analysis to external realizations, as a
by-product we shall obtain some interesting results on internal models also.


In view of Lemma 3.7, $\hat{x}(t) \in H_t^\square$, and consequently there are two matrix
functions $K_*$ and $K^*$ such that

$\hat{x}(t) = K_*(t)x_*(t) + K^*(t)x^*(t)$.  (4.3)

The components of the estimation error $x(t) - \hat{x}(t)$ are clearly orthogonal
to $H(dy)$ and hence in particular to the components of $x_*(t)$ and $x^*(t)$.
Therefore $E\{\hat{x}(t)x_*(t)'\} = E\{x(t)x_*(t)'\}$ and $E\{\hat{x}(t)x^*(t)'\} = E\{x(t)x^*(t)'\}$.
By Lemma 3.5, the first of these relations yields $P_* = K_*P_* + K^*P_*$ and
consequently

$K_*(t) + K^*(t) = I$  (4.4)

for all $t \in (0, T)$, because $P_*(t)$ is nonsingular on this interval. The
second relation yields

$P(t) = K_*(t)P_*(t) + K^*(t)P^*(t)$  (4.5)

for all $t \in [0, T)$. Then solving (4.4) and (4.5) for $K_*$ and $K^*$ we obtain
$K_* = Q^*Q^{-1}$ and $K^* = Q_*Q^{-1}$, where as before $Q_* = P - P_*$, $Q^* = P^* - P$
and $Q = P^* - P_*$. Note that $Q(t)$ is nonsingular for all $t \in [0, T)$
(Corollary 3.6.1) and that

$Q(t) = Q_*(t) + Q^*(t)$.  (4.6)

THEOREM 4.1. Let $x$ be the state process of a realization (1.1) of
class $\mathcal{S}$. Then the smoothing estimate (4.1) is given by

$\hat{x}(t) = [I - Q_*(t)Q(t)^{-1}]x_*(t) + Q_*(t)Q(t)^{-1}x^*(t)$  (4.7)

and the error covariance function (4.2) by

$\Sigma(t) = Q_*(t) - Q_*(t)Q(t)^{-1}Q_*(t)$  (4.8)

for all $t \in [0, T)$.

Proof. Relation (4.7) was derived above for $t \in (0, T)$; for $t = 0$ (4.7)
follows from (4.19) below. To prove (4.8) note that

$x - \hat{x} = (I - Q_*Q^{-1})(x - x_*) + Q_*Q^{-1}(x - x^*)$.  (4.9)

By Lemma 3.5 the two terms of (4.9) are orthogonal and therefore, observing
(3.2c) and (3.21),

$\Sigma = (I - Q_*Q^{-1})Q_*(I - Q^{-1}Q_*) + Q_*Q^{-1}Q^*Q^{-1}Q_*$,

which, in view of (4.6), yields (4.8). □
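As a worked check of Theorem 4.1, the sketch below evaluates (4.7) and (4.8) in the scalar case with exact rational arithmetic. The function name and the numerical values are hypothetical and chosen only so that $P_* < P < P^*$; the code also confirms that the two-term expression for $\Sigma$ in the proof collapses to (4.8).

```python
from fractions import Fraction


def smoother_from_bounds(p_min, p, p_max, x_min, x_max):
    """Scalar version of (4.7)-(4.8).

    p_min, p, p_max play the roles of P_*, P, P^*;
    x_min, x_max play the roles of x_*(t), x^*(t).
    """
    q_lower = p - p_min          # Q_* = P - P_*
    q = p_max - p_min            # Q   = P^* - P_*
    k_upper = q_lower / q        # K^* = Q_* Q^{-1}
    k_lower = 1 - k_upper        # K_* = I - Q_* Q^{-1} = Q^* Q^{-1}
    x_hat = k_lower * x_min + k_upper * x_max
    sigma = q_lower - q_lower * q_lower / q
    return x_hat, sigma, k_lower, k_upper


# Hypothetical values with P_* < P < P^*.
x_hat, sigma, k_lower, k_upper = smoother_from_bounds(
    Fraction(1), Fraction(3, 2), Fraction(3), Fraction(2), Fraction(6))

# Two-term form of Sigma from the proof, with Q_* = 1/2 and Q^* = 3/2.
long_form = k_lower**2 * Fraction(1, 2) + k_upper**2 * Fraction(3, 2)
print(x_hat, sigma, long_form == sigma)
```

In the matrix case the products do not commute, so the order of factors in (4.7) and (4.8) must be kept as written; the scalar sketch hides this.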

Relation (4.5) should be compared with the decomposition in [46, Theorem 6].
Note however that $K_*(t)$ and $K^*(t)$ are projections if and only if the
realization $S$ is internal. To see this observe that $(K^*)^2 = K^*$, i.e.,
$Q_*Q^{-1}Q_* = Q_*$, if and only if $\Sigma = 0$ (Theorem 4.1).

Theorem 4.1 is a generalization of results given in [20-22]. Following
the procedure in [22] we obtain an alternative derivation by observing that
$x_*(t)$ and

$z(t) = x^*(t) - x_*(t)$  (4.10)

are orthogonal (to see this, note that $x_*(t) = E\{x^*(t) \mid H_t^-(dy)\}$) and applying
Lemma 2.1. In fact, since $\hat{x}(t) = E\{x(t) \mid H_t^\square\}$ (Lemma 3.7) and
$H_t^\square = H_t(x_*) \oplus H_t(z)$,

$\hat{x}(t) = E\{x(t) \mid x_*(t)\} + E\{x(t) \mid z(t)\}$.  (4.11)

Then using Lemmas 2.1 and 3.5 and the fact that

$Q(t) = E\{z(t)z(t)'\}$  (4.12)

we obtain

$\hat{x}(t) = x_*(t) + Q_*(t)Q(t)^{-1}z(t)$,  (4.13)

which is precisely (4.7).

If, for the moment, we restrict our attention to realizations in the
interior of $\mathcal{S}$ we obtain the following well-known result.

COROLLARY 4.1. Let $S \in$ int $\mathcal{S}$, let $x$ be the state process of $S$, and
let $\hat{x}$ be the corresponding smoothing estimate (4.1). Then, for each $t \in [0, T)$,

$\hat{x}(t) = \Sigma(t)[Q_*(t)^{-1}x_*(t) + Q^*(t)^{-1}x^*(t)]$,  (4.14)

where $x_*$ and $x^*$ are given by (3.2) and (3.20) respectively and the smoothing
error covariance $\Sigma$ by

$\Sigma(t)^{-1} = Q_*(t)^{-1} + Q^*(t)^{-1}$.  (4.15)

Proof. Since $S \in$ int $\mathcal{S}$, $Q_*$ and $Q^*$ are invertible. By writing (4.8)
as $\Sigma = Q_*Q^{-1}(Q - Q_*)$ and using (4.6), it is seen that

$\Sigma = Q_*Q^{-1}Q^*$.  (4.16)

Inverting this and again using (4.6) yields (4.15). From (4.16) we also see
that $Q_*Q^{-1} = \Sigma(Q^*)^{-1}$. Then $I - Q_*Q^{-1} = \Sigma[\Sigma^{-1} - (Q^*)^{-1}] = \Sigma Q_*^{-1}$. Hence
(4.14) follows from (4.7). □
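The information-weighted form (4.14)-(4.15) is easy to compare with (4.7)-(4.8) numerically. The following scalar sketch uses hypothetical values for $Q_*$, $Q^*$, $x_*$ and $x^*$ and exact rationals; it is an illustration of the algebra in the proof, not an implementation of the two filters (3.2) and (3.20) themselves.

```python
from fractions import Fraction


def two_filter_combine(x_fwd, q_fwd, x_bwd, q_bwd):
    """Scalar (4.14)-(4.15): combine x_* and x^* with inverse-covariance weights.

    x_fwd, q_fwd stand for x_*(t), Q_*(t); x_bwd, q_bwd for x^*(t), Q^*(t).
    """
    sigma = 1 / (1 / q_fwd + 1 / q_bwd)                # (4.15)
    x_hat = sigma * (x_fwd / q_fwd + x_bwd / q_bwd)    # (4.14)
    return x_hat, sigma


# Hypothetical interior point: Q_* = 1/2, Q^* = 3/2, so Q = 2.
x_lo, q_lo = Fraction(2), Fraction(1, 2)
x_hi, q_hi = Fraction(6), Fraction(3, 2)
x_hat, sigma = two_filter_combine(x_lo, q_lo, x_hi, q_hi)

# Same quantities from (4.13) and (4.8).
q = q_lo + q_hi
x_hat_47 = x_lo + q_lo / q * (x_hi - x_lo)
sigma_48 = q_lo - q_lo * q_lo / q
print(x_hat == x_hat_47, sigma == sigma_48)
```

Both inverses must exist, which is exactly the restriction to int $\mathcal{S}$; as $t \to T$ the weight $1/Q^*(t)$ tends to zero and the estimate reduces to $x_*$.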

Relations (4.14) and (4.15) together with (3.2) and (3.20) constitute the
Mayne-Fraser two-filter formula [5, 6], which has received considerable
attention in the literature [7-9, 13-17]. Although this algorithm is easy to
derive formally [9], its probabilistic justification has caused considerable
difficulty, partly due to the fact that $Q^*(t) \to \infty$ as $t \to T$. The system
(3.20) has usually been interpreted as a backward filter, and in [14-17] it is
presented as the limit of such a filter as a certain covariance matrix
function tends to infinity. However, in our stochastic realization setting
(3.20) has a very natural interpretation: It is simply the maximum-variance
forward realization $S^*$. By using the identity

$x^*(t) = \bar{P}_*(t)^{-1}\bar{x}_*(t)$  (4.17)

we can instead write the smoothing formula (4.14) in terms of two Kalman-Bucy
filters, one (3.2) evolving forward and the other (3.15) evolving backward in
time. (Note that then (4.14) is defined on the whole interval $[0, T]$.) This
fact was pointed out in [14, 15, 17], in which papers the backward estimate

$\hat{x}_b(t) = E\{x(t) \mid H_t^+(dy)\}$  (4.18)

was used in place of $\bar{x}_*$, a choice that may at first sight seem more natural.
The reader should however note that

$\hat{x}_b(t) = P(t)P^*(t)^{-1}x^*(t)$  (4.19)

is not invariant over $\mathcal{S}$ and is therefore less suitable for our purposes.

It is not hard to see that

$(Q^*)^{-1} = [(Q^*)^{-1} + P^{-1}]P(P^*)^{-1}$  (4.20)

and consequently (4.14) may also be written

$\hat{x}(t) = \Sigma(t)\{Q_*(t)^{-1}x_*(t) + [Q^*(t)^{-1} + P(t)^{-1}]\hat{x}_b(t)\}$,  (4.21)

which is the formula presented in [14, 15, 17]. The partitioned smoothing
formula [12, 13] also can be seen to be equivalent to (4.14), and it can be
used to derive all the equations of the Mayne-Fraser procedure. In the early
papers [7, 5], relation (4.14) was introduced via a formula [47] for optimal
weighting of two estimates with orthogonal errors. No justification of this
orthogonality was given in [S], and the argument in [7] is incomplete due to
problems with the end point condition. (A more satisfactory treatment has
recently been presented in [4S].) However, the stochastic realization theory
provides a natural justification of this procedure. Indeed, (3.23) is the
required orthogonality condition.
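Identity (4.20) and the equivalence of (4.14) and (4.21) can be verified in the scalar case. The sketch below is only a numerical illustration with hypothetical values; `backward_estimate_form` is an assumed helper name implementing the scalar form of (4.19).

```python
from fractions import Fraction


def backward_estimate_form(p, p_max, x_max):
    """Scalar (4.19): x_b = P (P^*)^{-1} x^*."""
    return p / p_max * x_max


# Hypothetical values with P_* < P < P^*.
p_min, p, p_max = Fraction(1), Fraction(3, 2), Fraction(3)
x_min, x_max = Fraction(2), Fraction(6)
q_lower, q_upper = p - p_min, p_max - p          # Q_*, Q^*
sigma = 1 / (1 / q_lower + 1 / q_upper)          # (4.15)

# Both sides of identity (4.20).
lhs_420 = 1 / q_upper
rhs_420 = (1 / q_upper + 1 / p) * p / p_max

# Smoothing estimate via (4.14) and via (4.21).
x_b = backward_estimate_form(p, p_max, x_max)
x_hat_414 = sigma * (x_min / q_lower + x_max / q_upper)
x_hat_421 = sigma * (x_min / q_lower + (1 / q_upper + 1 / p) * x_b)
print(lhs_420 == rhs_420, x_hat_414 == x_hat_421)
```

Changing `p` while keeping `p_min`, `p_max` and `x_max` fixed changes `x_b`, which is the lack of invariance over $\mathcal{S}$ noted after (4.19).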
The smoothing formulas (4.7) and (4.14) are both based on the nonorthogonal
decomposition (3.29), whereas (4.13) corresponds to the orthogonal decomposition

$H_t^\square = H_t(x_*) \oplus H_t(z)$  (4.22)

(where, in either case, Lemma 3.7 justifies the restriction to the finite
dimensional frame space $H_t^\square$). We shall now take a closer look at
representations of the latter type. It follows from (3.4) and (3.17) that $z$ as
defined by (4.10) is the solution of

$dz = \Gamma_* z\,dt - QC'R^{-1/2}dw^*$;  $z(0) = x^*(0)$,  (4.23)

where $\Gamma_*$ is the feedback matrix (3.25) of the Kalman-Bucy filter (3.2). To
see this, note that the input process $w^*$ of the maximum variance realization
$S^*$ is related to the innovation process $w_*$ through the relation

$dw_* = R^{-1/2}Cz\,dt + dw^*$  (4.24)

and that $B^* - B_* = -QC'R^{-1/2}$. We shall need the backward counterpart of
(4.23). Observing that $Q$ is the covariance function of $z$, Lemma 2.3 yields
the following equation for $\bar{z} = Q^{-1}z$:

$d\bar{z} = -\Gamma_*'\bar{z}\,dt - C'R^{-1/2}dw_*$;  $\bar{z}(T) = 0$,  (4.25)

for, in view of (4.24), $w_*$ is the backward counterpart of $w^*$ with respect
to (4.23). Note that $\bar{z}$ is defined on the whole interval $[0, T]$. The
covariance matrix $\bar{Q} = Q^{-1}$ of $\bar{z}$ satisfies

$\dot{\bar{Q}} = -\Gamma_*'\bar{Q} - \bar{Q}\Gamma_* - C'R^{-1}C$;  $\bar{Q}(T) = 0$.  (4.26)

The estimate $\hat{x}$ is then obtained from (4.13).
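The passage from (4.23) to (4.26) can be checked pointwise in the scalar case. The sketch below assumes, as (4.23) implies, that the covariance $Q$ of $z$ satisfies the Riccati equation $\dot{Q} = \Gamma_*Q + Q\Gamma_*' + QC'R^{-1}CQ$; this equation is not written out in the text and is stated here as an assumption. The names `gamma` and `h` (for $C'R^{-1}C$) and their values are hypothetical.

```python
from fractions import Fraction


def q_dot(gamma, h, q):
    """Assumed scalar Riccati rate for Q: dQ/dt = 2*gamma*Q + h*Q**2."""
    return 2 * gamma * q + h * q * q


def q_bar_dot(gamma, h, q_bar):
    """Scalar form of (4.26): dQbar/dt = -2*gamma*Qbar - h (linear in Qbar)."""
    return -2 * gamma * q_bar - h


# Hypothetical point values.
gamma, h, q = Fraction(-1), Fraction(3), Fraction(2)

# d/dt (1/Q) = -Qdot / Q**2 must agree with (4.26) evaluated at Qbar = 1/Q.
chain_rule = -q_dot(gamma, h, q) / (q * q)
direct = q_bar_dot(gamma, h, 1 / q)
print(chain_rule == direct)
```

The point of the change of variable is visible here: the quadratic term in the equation for $Q$ becomes a constant forcing term for $\bar{Q}$, so (4.26) is a linear equation that can be started from $\bar{Q}(T) = 0$ even though $Q(t) \to \infty$ as $t \to T$.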

THEOREM 4.2. Let $x$ be the state process of an arbitrary realization in
$\mathcal{S}$. Then the smoothing estimate $\hat{x}(t)$ satisfies

$\hat{x}(t) = x_*(t) + Q_*(t)\bar{z}(t)$  (4.27)

for all $t \in [0, T]$, where $x_*$ is given by (3.2) and $\bar{z}$ by (4.25) and (3.3).
The process $\bar{z}$ is related to $x_*$ and $x^*$ through the relation

$\bar{z}(t) = \bar{Q}(t)[x^*(t) - x_*(t)]$ for $t \in [0, T)$.  (4.28)

Relation (4.27) is the smoothing formula of Bryson and Frazier [2]. (Also
see [3, 4] and, in particular, [9].) What is new here is its interpretation
(4.28) in terms of the minimum and maximum variance realizations $S_*$ and $S^*$.
Theorem 4.2 can also be regarded as a generalization of a result presented in
[21], and the basic techniques used there provide an alternative approach to
deriving the above result.

COROLLARY 4.2. The smoothing estimate (4.27) satisfies the stochastic
differential equation

$d\hat{x} = A\hat{x}\,dt + B(I - D'R^{-1}D)B'\bar{z}\,dt + BD'R^{-1}(dy - C\hat{x}\,dt)$  (4.29)

with initial condition $\hat{x}(T) = x_*(T)$. If $S \in \mathcal{S}_+$, $\bar{z}$ can be replaced by
$Q_*^{-1}(\hat{x} - x_*)$ in (4.29).

Proof. Inserting (3.2a), (3.2d) and (4.25) into

$d\hat{x} = dx_* + Q_*\,d\bar{z} + \dot{Q}_*\bar{z}\,dt$

and using (3.2b) yields (4.29). If $S \in \mathcal{S}_+$, $Q_*^{-1}$ exists (Corollary 3.6.2),
and (4.27) can be solved for $\bar{z}$. □

We shall now study two different special cases of (4.29). First, let
$BD' = 0$; this is a standard assumption in the smoothing literature. Then $\hat{x}$
is differentiable, and (4.29) reduces to

$\dfrac{d\hat{x}}{dt} = A\hat{x} + BB'\bar{z}$;  $\hat{x}(T) = x_*(T)$.  (4.30)

For realizations $S \in \mathcal{S}_+$ (4.30) reduces to the smoothing formula of Rauch,
Tung and Striebel [3]

$\dfrac{d\hat{x}}{dt} = A\hat{x} + BB'Q_*^{-1}(\hat{x} - x_*)$;  $\hat{x}(T) = x_*(T)$.  (4.31)
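Formula (4.31) is integrated backward from the end point $T$, with the filter output $x_*$ and error covariance $Q_*$ supplied by a forward pass. The following is a minimal sketch of that backward sweep, assuming a scalar state, constant $A$ and $BB'$, a uniform time grid and explicit Euler steps; none of these discretization choices come from the text, and the function name is hypothetical.

```python
def rts_backward_sweep(a, bb, q_lower, x_filt, dt):
    """Integrate (4.31) backward in time with explicit Euler steps.

    a        : scalar A (assumed constant)
    bb       : scalar BB' (assumed constant)
    q_lower  : Q_*(t_k) on a uniform grid, all entries positive
    x_filt   : x_*(t_k) on the same grid
    dt       : grid spacing
    """
    n = len(x_filt)
    x_hat = [0.0] * n
    x_hat[-1] = x_filt[-1]                 # end condition x_hat(T) = x_*(T)
    for k in range(n - 1, 0, -1):
        rate = a * x_hat[k] + bb / q_lower[k] * (x_hat[k] - x_filt[k])
        x_hat[k - 1] = x_hat[k] - dt * rate
    return x_hat


# Hypothetical filter output on a three-point grid.
smoothed = rts_backward_sweep(a=-1.0, bb=1.0, q_lower=[1.0, 0.8, 0.7],
                              x_filt=[0.0, 0.5, 0.6], dt=0.1)
print(smoothed[-1] == 0.6)
```

The division by `q_lower[k]` is where the assumption $S \in \mathcal{S}_+$ enters: by Corollary 3.6.2, $Q_*(t)$ is then invertible on all of $[0, T]$.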


Secondly, assume that $D$ is square. Then $D$ is full rank and $D'R^{-1}D = I$.
Hence

$d\hat{x} = A\hat{x}\,dt + BD^{-1}(dy - C\hat{x}\,dt)$;  $\hat{x}(T) = x_*(T)$,  (4.32)

which defines a realization in $\mathcal{S}_0$. Note that the original realization $S$
need not be internal; it may have an initial condition $x(0)$ whose components
do not belong to $H(dy)$.

The problem of smoothing can be regarded as that of finding the "internal
part" of the state process. Given a realization $S \in \mathcal{S}_+$, we shall
next look at the structure of the "external part," i.e., the smoothing error
$\tilde{x} := x - \hat{x}$. To this end, first note that, given a realization (1.1), there
exists an orthogonal $p \times p$-matrix $V(t)$ for each $t \in [0, T]$ such that

$\begin{bmatrix} B(t) \\ D(t) \end{bmatrix} = \begin{bmatrix} B_1(t) & B_2(t) \\ R^{1/2}(t) & 0 \end{bmatrix} V(t)$,  (4.33a)

where $B_1$ is $n \times m$ and $B_2$ is $n \times (p - m)$. Next let

$\begin{bmatrix} du \\ dv \end{bmatrix} = V\,dw$  (4.33b)

define a pair of orthogonal increment processes $u$ and $v$, of dimensions $m$
and $p - m$ respectively. Obviously (4.33b) satisfies (1.2).
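The factorization (4.33a) can be made concrete in the smallest nontrivial case. The sketch below assumes $n = m = 1$ and $p = 2$, so that $B$ and $D$ are row vectors of length two, and builds $V$ as a plane rotation whose first row is $D$ normalized. This particular construction and the name `split_input` are illustrative assumptions; the text only asserts that some orthogonal $V$ exists.

```python
import math


def split_input(b, d):
    """For n = m = 1, p = 2 return (V, B1, B2, sqrt_r) satisfying (4.33a).

    b, d are the rows B and D; d must be nonzero so that R = D D' > 0.
    """
    sqrt_r = math.hypot(d[0], d[1])            # R^{1/2}
    v = [[d[0] / sqrt_r, d[1] / sqrt_r],       # first row: D normalized
         [-d[1] / sqrt_r, d[0] / sqrt_r]]      # second row: orthogonal to D
    b1 = b[0] * v[0][0] + b[1] * v[0][1]       # (B1, B2) = B V'
    b2 = b[0] * v[1][0] + b[1] * v[1][1]
    return v, b1, b2, sqrt_r


# Hypothetical data.
v, b1, b2, sqrt_r = split_input((1.0, 2.0), (3.0, 4.0))
print(round(b1, 12), round(b2, 12), sqrt_r)
```

Here `b1` equals $BD'R^{-1/2}$, the part of the state noise that is correlated with the observation noise, while `b2` multiplies the component $dv$ that does not enter $dy$ directly.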


THEOREM 4.3. Let $x$ be the state process of a realization $S \in \mathcal{S}_+$
and let $B_2$ and $v$ be defined by (4.33). Then the smoothing error $\tilde{x}$ is
given by

$\tilde{x}(t) = Q_*(t)\eta(t)$  (4.34a)
$d\eta = -\Gamma_*'\eta\,dt + Q_*^{-1}B_2\,d\zeta$;  $\eta(T) = \eta_T$  (4.34b)

where $\eta_T = Q_*(T)^{-1}[x(T) - x_*(T)]$ and $\zeta$ is a $(p - m)$-dimensional orthogonal
increment process of type (1.2) such that $H(d\zeta) \perp H(dy)$. The representation
(4.34) is a backward realization in the sense that $\eta_T \perp H(d\zeta)$ and the
increments of $\zeta$ are given by

$d\zeta = dv - B_2'Q_*^{-1}(x - x_*)\,dt$.  (4.35)

Proof. Define $z_* := x - x_*$. Replacing $B\,dw$ and $D\,dw$ in (1.1) by $B_1\,du +
B_2\,dv$ and $R^{1/2}\,du$ respectively and noting that the innovation process $w_*$
in (3.4) is given by

$dw_* = du + R^{-1/2}Cz_*\,dt$  (4.36)

and that $B_1 - B_* = -Q_*C'R^{-1/2}$ (Lemma 3.1), it is just a matter of simple
calculations to see that $z_*$ satisfies

$dz_* = \Gamma_* z_*\,dt - Q_*C'R^{-1/2}\,du + B_2\,dv$;  $z_*(0) = \xi$.

Since $S \in \mathcal{S}_+$, $Q_*(t)^{-1}$ exists for all $t \in [0, T]$ (Corollary 3.6.2). Hence, by
Lemma 2.3 and (4.36), $\bar{z}_* = Q_*^{-1}z_*$ satisfies the backward Markovian representation

$d\bar{z}_* = -\Gamma_*'\bar{z}_*\,dt - C'R^{-1/2}\,dw_* + Q_*^{-1}B_2\,d\zeta$;  $\bar{z}_*(T) = \eta_T$  (4.37)

where $\zeta$ is given by (4.35). Since $H(d\zeta) \perp H(dw_*)$ (by construction) and
$H(dw_*) = H(dy)$, $H(d\zeta) \perp H(dy)$ as required. Now, in view of (4.27),
$\tilde{x} = z_* - Q_*\bar{z}$, i.e., (4.34a) holds with $\eta := \bar{z}_* - \bar{z}$. Then (4.34b)
follows from (4.25) and (4.37). □

As a corollary we see that the state process of any realization $S \in \mathcal{S}_+$
can be decomposed into three orthogonal terms

$x(t) = x_*(t) + Q_*(t)\bar{z}(t) + Q_*(t)\eta(t)$,  (4.38)

each of which is the output of a stochastic system whose dynamical behavior
is determined by the function $\Gamma_*$. This is seen from (4.25), (4.34) and the
fact that (3.2a) can be written

$dx_* = \Gamma_* x_*\,dt + B_*R^{-1/2}\,dy$;  $x_*(0) = 0$.  (4.39)

[Note that both (4.25) and (4.34b) are backward representations. If we
transform to the forward setting the system matrices will be $\Gamma_*$ rather than $-\Gamma_*'$.]

The internal realizations play an interesting role in the theory of
smoothing. These are precisely the representations (1.1) for which the smoothing
problem is trivial, i.e., $\tilde{x} = 0$. The next theorem shows that (subject to a
mild regularity condition) $S \in \mathcal{S}$ is internal if and only if $B_2 = 0$ and the
components of the initial conditions of (1.1) belong to $H(dy)$. In view of
Theorem 4.3, this is to be expected.

THEOREM 4.4. A realization $S \in \mathcal{S}$ such that $\begin{bmatrix} B \\ D \end{bmatrix}$ has full rank is
internal if and only if $S \in \mathcal{S}_0$.

Proof. We only need to prove the "only if" part; the "if" part was proved
in Section 3. Let $S$ be internal. Since the condition $x(0) \in H(dy)$ holds
trivially, it only remains to show that $B_2$, as defined by (4.33a), is
identically zero. In view of the fact that $x = \hat{x}$, comparing (1.1) and (4.29) shows
that the identity

$B(I - D'R^{-1}D)B'\bar{z}\,dt + BD'R^{-1}D\,dw = B\,dw$  (4.40)

must hold. It is not hard to see that $BD'R^{-1}D = (B_1, 0)V$ and
$B(I - D'R^{-1}D)B' = B_2B_2'$, and therefore (4.40) can be written

$B_2B_2'\bar{z}\,dt = B_2\,dv$

which cannot hold unless $B_2 \equiv 0$. Then the full rank condition implies that
$p = m$. □
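The identity $B(I - D'R^{-1}D)B' = B_2B_2'$ used in the proof can be checked exactly in the case $n = m = 1$, $p = 2$, where it reduces to Lagrange's identity for two plane vectors. The sketch below assumes that $B_2$ is obtained from the rotation with first row $D/R^{1/2}$, so that $B_2 = (b_2d_1 - b_1d_2)/R^{1/2}$; this choice of $V$ and the function name are illustrative assumptions.

```python
from fractions import Fraction


def external_noise_gain(b, d):
    """Return (B(I - D'R^{-1}D)B', B2**2) for n = m = 1, p = 2.

    b, d are the rows B and D given as pairs of Fractions; d must be nonzero.
    """
    r = d[0] * d[0] + d[1] * d[1]                   # R = D D'
    bd = b[0] * d[0] + b[1] * d[1]                  # B D'
    lhs = b[0] * b[0] + b[1] * b[1] - bd * bd / r   # B B' - B D' R^{-1} D B'
    b2_sq = (b[1] * d[0] - b[0] * d[1]) ** 2 / r    # B2 B2'
    return lhs, b2_sq


# Hypothetical external case: B not parallel to D, so B2 is nonzero.
generic = external_noise_gain((Fraction(1), Fraction(2)), (Fraction(3), Fraction(4)))
# Hypothetical degenerate case: B parallel to D, so B2 = 0.
parallel = external_noise_gain((Fraction(3), Fraction(4)), (Fraction(3), Fraction(4)))
print(generic[0] == generic[1], parallel)
```

In the second case the state noise is entirely carried by the component that also drives the output, which is the situation $B_2 \equiv 0$ required for an internal realization.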
FOOTNOTES

1. Some of these shortcomings have been pointed out in a recent thesis by
Wall [48], brought to our attention after the submission of this paper.

2. E.g., the Moore-Penrose pseudo-inverse can be used.

3. It is not hard to see that the concept of minimality used here is
equivalent to assuming both that (i) the input-output map of (1.1a) is minimal
and that (ii) the family of state spaces $\{H_t(x); t \in [0, T]\}$ is minimal in
the sense of the geometric state space theory outlined in [27].


